| | (1) |
|---|---|
| (Intercept) | 4554.802 *** |
| | (408.046) |
| treat | 1794.343 ** |
| | (632.854) |
| N | 445 |
| R2 | 0.018 |
| logLik | -4542.741 |
| AIC | 9091.482 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
| | vs experimental control | vs population |
|---|---|---|
| (Intercept) | 4554.802 *** | 6984.170 *** |
| | (408.046) | (360.710) |
| treat | 1794.343 ** | -635.026 |
| | (632.854) | (657.137) |
| N | 445 | 614 |
| R2 | 0.018 | 0.002 |
| logLik | -4542.741 | -6346.371 |
| AIC | 9091.482 | 12698.742 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
(LaLonde tried a battery of econometric adjustments; none of them recovered the experimental benchmark)
Candidates for job training were unlike the general population. They were not even especially similar to people with comparable income, education, age, or race.
The treatment group differed from the general population in its expected potential income, so a naive difference of means combines the impact of job training with the baseline differences between applicants and non-applicants:
\[\underbrace{E[Y_i(1) - Y_i(0)|D_i=1]}_\text{job training effect} + \underbrace{E[Y_i(0)|D_i=1] - E[Y_i(0)|D_i=0]}_\text{differences between applicants and non-applicants}\]
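To see the decomposition in numbers, here is a minimal base-R sketch. The setup (the confounder `x`, the selection rule, the constant effect of 2) is my own toy illustration, not the job-training data:

```r
# Simulate potential outcomes with selection into treatment, then
# verify that the naive difference of means equals the treatment
# effect on the treated plus the selection term.
set.seed(1)
n  <- 1e5
x  <- rnorm(n)                    # a confounder
d  <- rbinom(n, 1, plogis(x))     # treatment more likely when x is high
y0 <- x + rnorm(n)                # potential outcome without treatment
y1 <- y0 + 2                      # constant treatment effect of 2
y  <- ifelse(d == 1, y1, y0)      # observed outcome

naive     <- mean(y[d == 1]) - mean(y[d == 0])
att       <- mean(y1[d == 1] - y0[d == 1])        # effect on the treated
selection <- mean(y0[d == 1]) - mean(y0[d == 0])  # E[Y(0)|D=1] - E[Y(0)|D=0]
all.equal(naive, att + selection)                 # the identity holds exactly
```

The identity holds exactly in any sample; the point is that `naive` is contaminated by `selection` whenever treated and untreated units differ in their untreated outcomes.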
| Variable | N | population, N = 429 | control, N = 260 | treated, N = 185 |
|---|---|---|---|---|
| age | 874 | 25 (19, 35) | 24 (19, 28) | 25 (20, 29) |
| educ | 874 | 11 (9, 12) | 10 (9, 11) | 11 (9, 12) |
| race | 874 | | | |
| black | | 87 (20%) | 215 (83%) | 156 (84%) |
| hispan | | 61 (14%) | 28 (11%) | 11 (5.9%) |
| white | | 281 (66%) | 17 (6.5%) | 18 (9.7%) |
| married | 874 | 220 (51%) | 40 (15%) | 35 (19%) |
| nodegree | 874 | 256 (60%) | 217 (83%) | 131 (71%) |
| income 1974 | 874 | 2,547 (0, 9,277) | 0 (0, 139) | 0 (0, 1,291) |
| income 1975 | 874 | 1,087 (0, 3,881) | 0 (0, 650) | 0 (0, 1,817) |
1 Median (IQR); n (%)
Control variables probably can’t fix this.
Applied for program \(\rightarrow\) Earnings
Applied for program \(\leftarrow\) Education \(\rightarrow\) Earnings
```r
dataFunction <- function() {
  N <- 1000
  education <- rpois(N, 3)                      # years of education
  D <- rbinom(N, 1, arm::invlogit(education))   # applied to job program
  Y_1 <- 2 + education^2 + rnorm(N)             # potential outcome with training
  Y_0 <- 0 + education^2 + rnorm(N)             # potential outcome without training
  ATE <- mean(Y_1 - Y_0)                        # the true effect
  Y <- ifelse(D == 1, Y_1, Y_0)                 # the observed outcomes
  df <- tibble("income" = Y, Y_0 = Y_0, Y_1 = Y_1,
               "treated" = factor(D, labels = c("Control", "Treated")),
               education)
  return(df)
}
```
```r
set.seed(9999)
data <- dataFunction()

ggplot(data, aes(x = income, y = treated)) +
  stat_halfeye() +          # from the ggdist package
  theme_bw() +
  xlab("Potential Outcome if Y(1)")
```

The true ATE in this simulated draw:

```
[1] 1.870983
```
Notably, since the education confounder has a non-linear effect on income, we don’t actually get a correct estimate even if we control for it:
| | no controls | ols + control |
|---|---|---|
| (Intercept) | 2.721 | -5.448 |
| | (1.187) | (0.411) |
| treatedTreated | 12.322 | -2.020 |
| | (1.259) | (0.455) |
| education | | 7.084 |
| | | (0.080) |
| Num.Obs. | 1000 | 1000 |

actual ATE = 1.871
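The failure above is purely one of functional form: if we happened to know that income depends on education squared, plain OLS with the right term would also recover the effect (the catch being that in real data we rarely know the form). A sketch that re-creates the document's data-generating process so it runs on its own (`plogis` is base R's equivalent of `arm::invlogit`):

```r
# Re-create the simulated data, then control for education^2 directly.
set.seed(9999)
N <- 1000
education <- rpois(N, 3)
D  <- rbinom(N, 1, plogis(education))    # plogis == arm::invlogit
Y1 <- 2 + education^2 + rnorm(N)
Y0 <- 0 + education^2 + rnorm(N)
income  <- ifelse(D == 1, Y1, Y0)
treated <- factor(D, labels = c("Control", "Treated"))

ols_quad <- lm(income ~ treated + I(education^2))
coef(ols_quad)["treatedTreated"]         # lands near the true effect of 2
```

Matching is attractive precisely because it sidesteps having to guess this functional form.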
The experimental ideal allows us to assume this:
\[Y_i(0), Y_i(1), X_i \mathrel{\unicode{x2AEB}} D_i\]

But we could still recover the average treatment effect if we can assume that assignment is conditionally independent of the potential outcomes:

\[Y_i(0), Y_i(1) \mathrel{\unicode{x2AEB}} D_i | X_i\]

Both matching and regression methods attempt to meet this more relaxed condition, but the former can potentially do so more flexibly.

Exact matching on education, however, actually gets us much closer to the true average treatment effect:
```r
ate <- mean(data$Y_1 - data$Y_0)

matched <- matchit(treated ~ education, data = data,
                   method = "exact",
                   estimand = "ATE")
mdat <- match.data(matched)

exact_match_fit <- lm(income ~ treated + education,
                      data = mdat, weights = weights)

# `ols` and `ols_control` are the two models from the previous table
modelsummary(list("no controls" = ols,
                  "ols + control" = ols_control,
                  "exact matching" = exact_match_fit),
             gof_map = c("nobs"),
             note = sprintf("actual ATE = %s", round(ate, digits = 3)))
```

| | no controls | ols + control | exact matching |
|---|---|---|---|
| (Intercept) | 2.721 | -5.448 | -5.446 |
| | (1.187) | (0.411) | (0.335) |
| treatedTreated | 12.322 | -2.020 | 1.879 |
| | (1.259) | (0.455) | (0.304) |
| education | | 7.084 | 5.546 |
| | | (0.080) | (0.071) |
| Num.Obs. | 1000 | 1000 | 862 |

actual ATE = 1.871
In repeated simulations, the exact matching estimates are unbiased and have only slightly more variance than the actual average treatment effect:
```r
set.seed(999)

fun <- function() {
  df <- dataFunction()               # re-using the data generating function
  ATE <- mean(df$Y_1 - df$Y_0)
  model <- lm(income ~ education + treated, data = df)
  matched <- matchit(treated ~ education, data = df,
                     method = "exact", estimand = "ATE")
  mdat <- match.data(matched)
  fit1 <- lm(income ~ treated, data = mdat, weights = weights)
  res <- tibble("regression estimate" = coef(model)[3],
                "matched estimate" = coef(fit1)[2],
                "Average Treatment Effect" = ATE)
  return(res)
}

results <- replicate(500, fun(), simplify = FALSE) |>
  bind_rows() |>
  pivot_longer(cols = everything()) |>
  mutate(value = unlist(value))
```

In practice, exact matching is not feasible for most data. Most real-world data sets will have few, if any, exact matches on multiple characteristics (think about trying to do this with a continuous covariate!).
Instead of matching on exact covariate values, propensity score matching (PSM) matches observations with similar estimated probabilities of receiving treatment.
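Under the hood, a propensity score is just a fitted probability from a treatment model; `matchit`'s `distance = "glm"` fits the analogous logistic regression internally. A base-R sketch on toy data (the `education - 3` selection rule here is made up purely for illustration):

```r
# Fit a logistic regression of treatment on the covariate; the fitted
# values are each unit's estimated probability of treatment.
set.seed(1)
education <- rpois(500, 3)
treated   <- rbinom(500, 1, plogis(education - 3))

ps_model <- glm(treated ~ education, family = binomial)
pscores  <- fitted(ps_model)   # estimated P(treated | education), in (0, 1)
range(pscores)
```

Units are then paired on `pscores` rather than on the covariates themselves, which is what makes the approach workable with many (or continuous) covariates.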
```r
matched <- matchit(treated ~ education, data = data,
                   method = "nearest",
                   ratio = 2,
                   distance = "glm")
mdat <- match.data(matched)

psm_fit <- lm(income ~ treated + education, data = mdat, weights = weights)

modelsummary(list("ols + control" = ols_control,
                  "exact matching" = exact_match_fit,
                  "psm" = psm_fit),
             gof_map = c("nobs"))
```

| | ols + control | exact matching | psm |
|---|---|---|---|
| (Intercept) | -5.448 | -5.446 | -6.542 |
| | (0.411) | (0.335) | (0.660) |
| treatedTreated | -2.020 | 1.879 | -1.649 |
| | (0.455) | (0.304) | (1.966) |
| education | 7.084 | 5.546 | 8.033 |
| | (0.080) | (0.071) | (0.357) |
| Num.Obs. | 1000 | 862 | 222 |
In practice, propensity scores often work completely fine, but they can actually worsen bias if the treatment selection process isn't modeled correctly. Other methods are more robust to this kind of mis-specification.
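That mis-specification risk can be probed with a toy simulation. Everything in this chunk (the two-covariate setup, the selection rule, the hand-rolled nearest-neighbor matcher) is my own illustration, not the original analysis:

```r
# Selection depends on x1^2 + x2^2, so a propensity model that is
# linear in x1 and x2 is mis-specified. Compare matching on the
# mis-specified vs correctly specified fitted scores.
set.seed(42)
n  <- 4000
x1 <- rnorm(n); x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(1 - (x1^2 + x2^2)))  # treated cluster near origin
y  <- 2 * d + x1^2 + x2^2 + rnorm(n)           # true treatment effect = 2

# 1:1 nearest-neighbor matching on a propensity score, in base R
nn_match_est <- function(ps) {
  treat <- which(d == 1); ctrl <- which(d == 0)
  nearest <- sapply(treat, function(i) ctrl[which.min(abs(ps[ctrl] - ps[i]))])
  mean(y[treat] - y[nearest])                  # ATT estimate
}

ps_wrong <- fitted(glm(d ~ x1 + x2, family = binomial))            # mis-specified
ps_right <- fitted(glm(d ~ I(x1^2) + I(x2^2), family = binomial))  # correct
c(misspecified = nn_match_est(ps_wrong), correct = nn_match_est(ps_right))
```

With the correct selection model the matched estimate sits near 2, while matching on the mis-specified score leaves substantial confounding in place.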
```r
matched <- matchit(treated ~ education, data = data,
                   method = "cem", estimand = "ATE")
mdat <- match.data(matched)

cem_fit <- lm(income ~ treated + education, data = mdat, weights = weights)

modelsummary(list("ols + control" = ols_control,
                  "exact matching" = exact_match_fit,
                  "psm" = psm_fit,
                  "cem" = cem_fit),
             gof_map = c("nobs"),
             note = sprintf("actual ATE = %s", round(ate, digits = 3)))
```

| | ols + control | exact matching | psm | cem |
|---|---|---|---|---|
| (Intercept) | -5.448 | -5.446 | -6.542 | -5.446 |
| | (0.411) | (0.335) | (0.660) | (0.335) |
| treatedTreated | -2.020 | 1.879 | -1.649 | 1.879 |
| | (0.455) | (0.304) | (1.966) | (0.304) |
| education | 7.084 | 5.546 | 8.033 | 5.546 |
| | (0.080) | (0.071) | (0.357) | (0.071) |
| Num.Obs. | 1000 | 862 | 222 | 862 |

actual ATE = 1.871
Methods like CEM group observations into coarsened strata before weighting, so we should expect outcomes to be clustered within strata. But we know how to deal with this!
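For instance, the marginaleffects package can report the treatment contrast with standard errors clustered on the matched strata (the `subclass` column that `match.data()` adds). A sketch along those lines, reusing the `cem_fit` and `mdat` objects from above (whether this is exactly how the output below was produced is an assumption on my part):

```r
# Average Treated - Control contrast from the CEM-weighted model, with
# standard errors clustered on the matched strata (`subclass`).
library(marginaleffects)

avg_comparisons(cem_fit,
                variables = "treated",
                vcov      = ~ subclass,   # cluster on CEM strata
                newdata   = mdat,
                wts       = "weights")    # use the matching weights
```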
```
 Estimate Std. Error  z Pr(>|z|)     S 2.5 % 97.5 %
     1.88      0.157 12   <0.001 107.5  1.57   2.19

Term: treated
Type: response
Comparison: Treated - Control
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
```
Matching can address some of the limitations of regression models and simulate experimental conditions.
Main advantages over regression are: